Group Number : 12

Name : Rutvik D Gadhiya, Harsh Sanjay Shah, Vrushabhkumar Shrimali


Kaggle link : https://www.kaggle.com/datasets/arianazmoudeh/airbnbopendata/

We're thrilled to share an exciting dataset about Airbnb rentals in New York City, USA, covering listings with construction years from 2003 to 2022. This dataset is a treasure trove of insights into one of the most dynamic cities on the planet.

After thorough cleaning, about 97,000 records across 19 columns remain. In this notebook, we'll explore:

Where are these Airbnb listings situated, and which neighborhoods are the hottest destinations?

How do prices vary depending on the type of accommodation and location?

What are the secrets behind a host's ratings and reviews?

Let's begin!

In [151]:
# Importing libraries for data analysis and visualization

# NumPy: Library for numerical computations and array manipulation
import numpy as np

# Pandas: Data manipulation and analysis library
import pandas as pd

# Seaborn: Data visualization library built on Matplotlib
import seaborn as sns

# Matplotlib.pyplot: Submodule of Matplotlib for creating plots
import matplotlib.pyplot as plt

# Plotly.graph_objs: Part of the Plotly library for interactive plots
import plotly.graph_objs as go

# Plotly.express: High-level interface for interactive visualizations using Plotly
import plotly.express as px

# os: Module for interacting with the operating system
import os

# WordCloud: Library for creating word clouds from text data
from wordcloud import WordCloud

# plotly.subplots.make_subplots: Function for creating subplots within a single Plotly figure
from plotly.subplots import make_subplots

# Jupyter Notebook magic command for displaying Matplotlib plots inline
%matplotlib inline
In [152]:
# Read data from the CSV file into the 'aib' DataFrame
aib = pd.read_csv("Airbnb_Open_Data.csv")

# Create a new DataFrame 'aib1' with a column 'house_rules' from 'aib'
aib1 = pd.DataFrame()
aib1["house_rules"] = aib["house_rules"]
/var/folders/sh/dbwfh3gx18n7xjl99467b4_m0000gn/T/ipykernel_3275/311372415.py:2: DtypeWarning:

Columns (25) have mixed types. Specify dtype option on import or set low_memory=False.

In [153]:
#To display the first 10 rows of the aib DataFrame
aib.head(10)
Out[153]:
id NAME host id host_identity_verified host name neighbourhood group neighbourhood lat long country ... service fee minimum nights number of reviews last review reviews per month review rate number calculated host listings count availability 365 house_rules license
0 1001254 Clean & quiet apt home by the park 80014485718 unconfirmed Madaline Brooklyn Kensington 40.64749 -73.97237 United States ... $193 10.0 9.0 10/19/2021 0.21 4.0 6.0 286.0 Clean up and treat the home the way you'd like... NaN
1 1002102 Skylit Midtown Castle 52335172823 verified Jenna Manhattan Midtown 40.75362 -73.98377 United States ... $28 30.0 45.0 5/21/2022 0.38 4.0 2.0 228.0 Pet friendly but please confirm with me if the... NaN
2 1002403 THE VILLAGE OF HARLEM....NEW YORK ! 78829239556 NaN Elise Manhattan Harlem 40.80902 -73.94190 United States ... $124 3.0 0.0 NaN NaN 5.0 1.0 352.0 I encourage you to use my kitchen, cooking and... NaN
3 1002755 NaN 85098326012 unconfirmed Garry Brooklyn Clinton Hill 40.68514 -73.95976 United States ... $74 30.0 270.0 7/5/2019 4.64 4.0 1.0 322.0 NaN NaN
4 1003689 Entire Apt: Spacious Studio/Loft by central park 92037596077 verified Lyndon Manhattan East Harlem 40.79851 -73.94399 United States ... $41 10.0 9.0 11/19/2018 0.10 3.0 1.0 289.0 Please no smoking in the house, porch or on th... NaN
5 1004098 Large Cozy 1 BR Apartment In Midtown East 45498551794 verified Michelle Manhattan Murray Hill 40.74767 -73.97500 United States ... $115 3.0 74.0 6/22/2019 0.59 3.0 1.0 374.0 No smoking, please, and no drugs. NaN
6 1004650 BlissArtsSpace! 61300605564 NaN Alberta Brooklyn Bedford-Stuyvesant 40.68688 -73.95596 United States ... $14 45.0 49.0 10/5/2017 0.40 5.0 1.0 224.0 Please no shoes in the house so bring slippers... NaN
7 1005202 BlissArtsSpace! 90821839709 unconfirmed Emma Brooklyn Bedford-Stuyvesant 40.68688 -73.95596 United States ... $212 45.0 49.0 10/5/2017 0.40 5.0 1.0 219.0 House Guidelines for our BnB We are delighted ... NaN
8 1005754 Large Furnished Room Near B'way 79384379533 verified Evelyn Manhattan Hell's Kitchen 40.76489 -73.98493 United States ... $204 2.0 430.0 6/24/2019 3.47 3.0 1.0 180.0 - Please clean up after yourself when using th... NaN
9 1006307 Cozy Clean Guest Room - Family Apt 75527839483 unconfirmed Carl Manhattan Upper West Side 40.80178 -73.96723 United States ... $58 2.0 118.0 7/21/2017 0.99 5.0 1.0 375.0 NO SMOKING OR PETS ANYWHERE ON THE PROPERTY 1.... NaN

10 rows × 26 columns

In [154]:
#Total data in the dataset.
len(aib)
Out[154]:
102599
In [155]:
#count the nan values
nan_counts = aib.isna().sum()
nan_counts
Out[155]:
id                                     0
NAME                                 250
host id                                0
host_identity_verified               289
host name                            406
neighbourhood group                   29
neighbourhood                         16
lat                                    8
long                                   8
country                              532
country code                         131
instant_bookable                     105
cancellation_policy                   76
room type                              0
Construction year                    214
price                                247
service fee                          273
minimum nights                       409
number of reviews                    183
last review                        15893
reviews per month                  15879
review rate number                   326
calculated host listings count       319
availability 365                     448
house_rules                        52131
license                           102597
dtype: int64
In [156]:
# Drop columns that are mostly empty or not needed for this analysis
columns_to_drop = ['license', 'country', 'country code','last review', 'host id','house_rules', 'reviews per month']
aib.drop(columns=columns_to_drop, inplace=True)
In [157]:
# Drop the remaining rows that contain NaN values
aib = aib.dropna()
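
Looking at the share of missing values per column, rather than the raw counts, can make it easier to decide which columns to drop before removing incomplete rows. The sketch below uses a small made-up frame; the values and the 0.5 threshold are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

# Toy frame standing in for the Airbnb data (values are made up)
toy = pd.DataFrame({
    "price": ["$100", None, "$250", "$80"],
    "license": [None, None, None, "41662/AL"],
    "room type": ["Private room", "Shared room", "Private room", "Hotel room"],
})

# Fraction of missing values in each column
missing_share = toy.isna().mean()

# Columns that are mostly empty (the 0.5 threshold is an arbitrary choice)
mostly_empty = missing_share[missing_share > 0.5].index.tolist()

# Drop those columns first, then drop the remaining incomplete rows
cleaned = toy.drop(columns=mostly_empty).dropna()
```

Dropping sparse columns before calling `dropna()` avoids discarding rows only because a nearly empty column such as `license` is missing.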
In [158]:
#Checking data types of the data.
aib.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 99502 entries, 0 to 102598
Data columns (total 19 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              99502 non-null  int64  
 1   NAME                            99502 non-null  object 
 2   host_identity_verified          99502 non-null  object 
 3   host name                       99502 non-null  object 
 4   neighbourhood group             99502 non-null  object 
 5   neighbourhood                   99502 non-null  object 
 6   lat                             99502 non-null  float64
 7   long                            99502 non-null  float64
 8   instant_bookable                99502 non-null  object 
 9   cancellation_policy             99502 non-null  object 
 10  room type                       99502 non-null  object 
 11  Construction year               99502 non-null  float64
 12  price                           99502 non-null  object 
 13  service fee                     99502 non-null  object 
 14  minimum nights                  99502 non-null  float64
 15  number of reviews               99502 non-null  float64
 16  review rate number              99502 non-null  float64
 17  calculated host listings count  99502 non-null  float64
 18  availability 365                99502 non-null  float64
dtypes: float64(8), int64(1), object(10)
memory usage: 15.2+ MB
In [159]:
# Convert the 'Construction year' column to string
aib['Construction year'] = aib['Construction year'].astype(str)

# Extract the numeric part and convert it to integers
aib['Construction year'] = aib['Construction year'].str.extract(r'(\d+)').astype(int)
In [160]:
# Convert all values in the 'neighbourhood group' column to lower case
aib['neighbourhood group'] = aib['neighbourhood group'].str.lower()
In [161]:
# Merge the two spellings of Brooklyn under one label.
# Note: this maps 'brooklyn' onto the misspelled 'brookln', so the group appears as 'brookln' in later outputs.
aib['neighbourhood group'] = aib['neighbourhood group'].str.replace('brooklyn', 'brookln')
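
One way to correct misspelled group names is to map each misspelling to its intended spelling with a dictionary. This is a minimal sketch on made-up values; the `spelling_fixes` mapping is hypothetical, and 'manhatan' is an assumed misspelling used only for illustration.

```python
import pandas as pd

# Toy values; 'manhatan' is an assumed misspelling used only for illustration
groups = pd.Series(["Brooklyn", "brookln", "Manhattan", "manhatan", "Queens"])

# Map each known misspelling to its correct form (hypothetical mapping)
spelling_fixes = {"brookln": "brooklyn", "manhatan": "manhattan"}

# Lower-case first, then replace whole values that match a misspelling
normalised = groups.str.lower().replace(spelling_fixes)
```

`Series.replace` with a dictionary matches whole values, so a correctly spelled name is never altered by accident.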
In [162]:
# Remove the dollar sign from 'price'
aib['price'] = aib['price'].str.replace(r'\$', '', regex=True)
# Remove the thousands separator (comma)
aib['price'] = aib['price'].str.replace(r',', '', regex=True)
# Convert the column from object (string) to a numeric dtype
aib['price'] = pd.to_numeric(aib['price'])
In [163]:
# Remove the dollar sign and commas from 'service fee', then convert it to a numeric dtype
aib['service fee'] = aib['service fee'].str.replace(r'\$', '', regex=True)
aib['service fee'] = aib['service fee'].str.replace(r',', '', regex=True)
aib['service fee'] = pd.to_numeric(aib['service fee'])
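
Since 'price' and 'service fee' are cleaned in the same way, the steps could be wrapped in a small helper. The sketch below is one possible version; `to_number` is a hypothetical name, and the sample values are made up.

```python
import pandas as pd

def to_number(money: pd.Series) -> pd.Series:
    """Strip '$', ',' and whitespace, then convert to a numeric dtype."""
    stripped = money.str.replace(r"[$,\s]", "", regex=True)
    return pd.to_numeric(stripped)

# Toy values in the same format as the 'price' and 'service fee' columns
fees = pd.Series(["$193", "$1,200 ", "$28"])
cleaned_fees = to_number(fees)
```

With such a helper, each column would need a single line, for example `aib['price'] = to_number(aib['price'])`.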
In [164]:
# Count the negative values in 'availability 365' (kept for reference only)
negative_values_count = (aib['availability 365'] < 0).sum()
# Keep only rows with at most 365 available days; negative values are not removed by this filter
aib = aib[aib['availability 365'] <= 365].copy()
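
Availability values outside the 0 to 365 range can be handled in more than one way. The sketch below shows two options on made-up values; which one is appropriate depends on how the out-of-range values arose, which this notebook does not establish.

```python
import pandas as pd

# Toy values: one negative and one above 365 (made up for illustration)
toy = pd.DataFrame({"availability 365": [-10.0, 0.0, 120.0, 365.0, 400.0]})

# How many values are negative?
negative_count = (toy["availability 365"] < 0).sum()

# Option A: raise negative values to 0, then drop rows above 365
clipped = toy.copy()
clipped["availability 365"] = clipped["availability 365"].clip(lower=0)
option_a = clipped[clipped["availability 365"] <= 365]

# Option B: keep only rows that are already inside the valid range
option_b = toy[toy["availability 365"].between(0, 365)]
```

Option A keeps more rows but treats a negative value as "never available"; option B discards any row whose value is implausible.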
In [165]:
# Convert 'minimum nights' column to integer data type
aib['minimum nights'] = aib['minimum nights'].astype(int)

# Convert 'number of reviews' column to integer data type
aib['number of reviews'] = aib['number of reviews'].astype(int)

# Convert 'review rate number' column to integer data type
aib['review rate number'] = aib['review rate number'].astype(int)

# Convert 'availability 365' column to integer data type
aib['availability 365'] = aib['availability 365'].astype(int)

# Convert 'calculated host listings count' column to integer data type
aib['calculated host listings count'] = aib['calculated host listings count'].astype(int)
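
The same conversions can be written as one call, because `DataFrame.astype` accepts a mapping from column name to dtype. A minimal sketch on a made-up frame with three of the columns:

```python
import pandas as pd

# Toy frame with whole numbers stored as floats (values are made up)
toy = pd.DataFrame({
    "minimum nights": [10.0, 30.0],
    "number of reviews": [9.0, 45.0],
    "availability 365": [286.0, 228.0],
})

# Columns that should hold integers
count_columns = ["minimum nights", "number of reviews", "availability 365"]

# One astype call converts every listed column
toy = toy.astype({column: "int64" for column in count_columns})
```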

Percentage and Count of Airbnb Accommodation by room type

In [166]:
# Count the number of listings ('id') in each 'room type' category
# Sort the listing counts in descending order to show the most common room types first
aib.groupby('room type')['id'].count().sort_values(ascending=False)
Out[166]:
room type
Entire home/apt    50668
Private room       43940
Shared room         2119
Hotel room           112
Name: id, dtype: int64

The most common room types are 'Entire home/apt' and 'Private room', which together account for 94,608 listings, or 97.7% of the total. 'Shared room' and 'Hotel room' account for only 2.3%, with 'Hotel room' having just 112 listings, or 0.1% of the total.
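
The shares can be checked directly from the counts printed in the output above. This short sketch rebuilds those counts as a Series and computes the percentages:

```python
import pandas as pd

# Listing counts per room type, copied from the output above
room_counts = pd.Series({
    "Entire home/apt": 50668,
    "Private room": 43940,
    "Shared room": 2119,
    "Hotel room": 112,
})

total = room_counts.sum()

# Share of each room type in percent, rounded to one decimal place
room_share = (room_counts / total * 100).round(1)

# Combined share of the two most common room types
top_two_share = round(room_counts.nlargest(2).sum() / total * 100, 1)
```

On the full DataFrame, `aib['room type'].value_counts(normalize=True)` gives the same shares as fractions.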

In [167]:
# Count the number of occurrences of each 'Construction year' and reset the index
year_counts = aib['Construction year'].value_counts().reset_index()

# Rename the columns to 'Construction year' and 'Count'
year_counts.columns = ['Construction year', 'Count']

# Sort the year counts by 'Construction year'
year_counts = year_counts.sort_values(by='Construction year')

# Create a line plot using Plotly Express (imported as 'px')
fig = px.line(
    year_counts,
    x='Construction year',
    y='Count',
    title='Construction year vs. building constructed per year',  # Set the title of the plot
    markers=True,  # Show markers at data points
    hover_name='Construction year',  # Display 'Construction year' when hovering over data points
    hover_data={'Count': True},  # Show the count in the hover tooltip
    color_discrete_sequence=px.colors.qualitative.Set1,  # Define a color scheme
)

# Increase the size of the markers on the line plot
fig.update_traces(marker=dict(size=10))

# Update the layout of the plot
fig.update_layout(
    plot_bgcolor='white',  # Set the background color of the plot
    paper_bgcolor='white',  # Set the background color of the entire plot area
    title_font=dict(size=24, color='black'),  # Customize the title font size and color
    title_x=0.5,  # Center the title
)

# Customize the hover tooltip format
fig.update_traces(
    hovertemplate="Construction year: %{x}<br>Count: %{y:.0f} <extra></extra>"
)

# Customize the x-axis tick values and labels
fig.update_xaxes(
    tickmode='array',  # Use a fixed set of tick values
    tickvals=[year for year in range(min(year_counts['Construction year']), max(year_counts['Construction year']) + 1, 2)],  # Set the tick values
    ticktext=[year for year in range(min(year_counts['Construction year']), max(year_counts['Construction year']) + 1, 2)]  # Set the tick labels
)

# Display the plot
fig.show()

The line graph plots the number of listings for each construction year from 2003 to 2022. Looking instead at the average price by construction year, the highest average was 639 for 2008 and the lowest was 614 for 2019.
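
Counting listings per construction year and averaging prices per construction year are two different aggregations. A minimal sketch of both, on made-up data (the prices below are illustrative and do not reproduce the figures quoted above):

```python
import pandas as pd

# Toy listings (made-up prices) standing in for the cleaned data
toy = pd.DataFrame({
    "Construction year": [2008, 2008, 2019, 2019, 2022],
    "price": [700, 600, 580, 620, 630],
})

# Number of listings per construction year
listings_per_year = toy["Construction year"].value_counts().sort_index()

# Average price per construction year
avg_price_per_year = toy.groupby("Construction year")["price"].mean()

# Years with the highest and lowest average price
priciest_year = avg_price_per_year.idxmax()
cheapest_year = avg_price_per_year.idxmin()
```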

Avg listing price by neighbourhood

In [168]:
# Define custom colors for the bar chart
neighbor_palette = ['#FF5733', '#FFC300', '#DAF7A6']

# Calculate the average prices for each combination of 'neighbourhood group' and 'room type' and reset the index
avg_prices = aib.groupby(['neighbourhood group', 'room type'])['price'].mean().reset_index()

# Get unique room types
room_types = avg_prices['room type'].unique()

# Create a subplot with 2 rows and 2 columns, and set the figure size
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(9, 8))

# Iterate over each room type and create a bar chart for each
for i, room_type in enumerate(room_types):
    ax = axes[i // 2, i % 2]  # Select the current subplot
    subset = avg_prices[avg_prices['room type'] == room_type]  # Subset data for the current room type
    
    # Create a bar chart with neighborhood group on the x-axis and average price on the y-axis
    ax.bar(subset['neighbourhood group'], subset['price'], color=neighbor_palette)
    
    # Set the title, x-axis label, y-axis label, and add a horizontal grid
    ax.set_title(f'Room Type: {room_type}')
    ax.set_xlabel('Neighborhood')
    ax.set_ylabel('Avg Listings Prices')
    ax.grid(axis='y', linestyle='--', alpha=0.7)  # Add a horizontal grid with dashed lines
    ax.set_axisbelow(True)  # Ensure the grid is behind the bars

# Adjust the layout of the subplots for better spacing
plt.tight_layout()

# Set the main title for the entire figure
plt.suptitle('Avg Listings Prices by Neighborhood and Room Type', fontsize=16)

# Adjust the position of the main title to avoid overlap with subplots
plt.subplots_adjust(top=0.9)

# Display the plot
plt.show()

The average nightly Airbnb prices in New York City vary by neighborhood group and room type. In the 'Entire home/apt' category, Staten Island has the highest average price at around 625, while the other neighborhood groups average approximately 610. For 'Shared room' listings, Staten Island again leads with an average price of about 710. In the 'Private room' category, by contrast, Staten Island has the lowest average price, at around 600. Finally, in the 'Hotel room' category, which appears in only three neighborhood groups in the dataset, Brooklyn has the highest average price at approximately 720.
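
The same comparison can be read from a table instead of a chart. As a sketch on made-up data, a pivot table puts neighbourhood groups in rows and room types in columns, and `idxmax` then names the most expensive group for each room type:

```python
import pandas as pd

# Toy listings (made-up prices) for two groups and two room types
toy = pd.DataFrame({
    "neighbourhood group": ["bronx", "bronx", "queens", "queens", "queens"],
    "room type": ["Private room", "Shared room", "Private room", "Private room", "Shared room"],
    "price": [600, 700, 500, 800, 650],
})

# Rows: neighbourhood group, columns: room type, cells: mean price
price_table = toy.pivot_table(
    index="neighbourhood group", columns="room type", values="price", aggfunc="mean"
)

# For each room type, the neighbourhood group with the highest mean price
most_expensive = price_table.idxmax()
```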

In [169]:
# Sort listings by price (highest first) and show the top five
sorted_price = aib.sort_values('price', ascending=False)
top_5 = sorted_price.head(5)
top_5
Out[169]:
id NAME host_identity_verified host name neighbourhood group neighbourhood lat long instant_bookable cancellation_policy room type Construction year price service fee minimum nights number of reviews review rate number calculated host listings count availability 365
5207 3877162 Bushwick Room w/ Private Entrance & Bathroom! unconfirmed Julie brookln Bushwick 40.70322 -73.92913 True strict Private room 2020 1200 240 1 16 1 5 30
20343 12236775 Lovely apartment in Williamsburg verified Harry brookln Greenpoint 40.72253 -73.94350 True flexible Private room 2020 1200 240 7 6 2 1 62
17080 10434620 West 50th Street, Luxury Svcd Studio Apt verified Ken manhattan Hell's Kitchen 40.76294 -73.98574 True flexible Entire home/apt 2009 1200 240 30 1 4 87 329
75053 42453108 Cozy room in bright, spacious apartment unconfirmed Steven bronx Hunts Point 40.81731 -73.89052 False moderate Private room 2003 1200 240 21 0 2 4 341
50535 28911817 Stylish Petite Private Room in Brooklyn verified Shana brookln Bedford-Stuyvesant 40.67842 -73.91024 False moderate Private room 2020 1200 240 2 24 2 1 365

Bar chart of top ten neighborhoods price comparison.

Create a horizontal bar chart to display the top 10 most expensive neighborhoods in the dataset. Create another chart with the 10 cheapest neighborhoods in the dataset. Create a box and whisker chart that showcases the price distribution of all listings split by room type.

In [170]:
# Create subplots with two rows and one column
fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)

# Data for the most expensive neighborhoods chart
top_neighbourhoods_expensive = aib.groupby('neighbourhood')['price'].median().nlargest(10).reset_index()
top_neighbourhoods_expensive = top_neighbourhoods_expensive.sort_values(by='price', ascending=True)

# Data for the least expensive neighborhoods chart
top_neighbourhoods_cheap = aib.groupby('neighbourhood')['price'].median().nsmallest(10).reset_index()
top_neighbourhoods_cheap = top_neighbourhoods_cheap.sort_values(by='price', ascending=False)

# Create the first subplot (most expensive neighborhoods)
trace1 = go.Bar(
    x=top_neighbourhoods_expensive['price'],
    y=top_neighbourhoods_expensive['neighbourhood'],
    orientation='h',
    marker=dict(color='skyblue'),
    name='Most Expensive'
)

# Create the second subplot (least expensive neighborhoods)
trace2 = go.Bar(
    x=top_neighbourhoods_cheap['price'],
    y=top_neighbourhoods_cheap['neighbourhood'],
    orientation='h',
    marker=dict(color='lightgreen'),
    name='Least Expensive'
)

# Add the traces to the subplots
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=2, col=1)

# Update layout for both subplots
fig.update_layout(
    title_text="Top Ten Neighborhoods Comparison",
    title_x=0.5,
    showlegend=True,  # Show legends
    legend=dict(x=1, y=0.5),  # Adjust legend position
    plot_bgcolor='white',
    paper_bgcolor='white',
    height=800,
)

# Update labels for both subplots
fig.update_xaxes(title_text="Price [$]", row=2, col=1)
fig.update_yaxes(title_text="Neighbourhood")
 
# Show the combined subplot
fig.show()

The horizontal bar charts show the median Airbnb price per night for the ten most expensive and ten least expensive neighborhoods in New York City. New Dorp had the highest median price per night, approximately 1,044, followed closely by Staten Island at 1,042. At the other end, Lighthouse Hill had the lowest median price per night at just 127. These substantial differences between neighborhoods may be attributable to several factors, including the quality of the rentals, proximity to local attractions, and seasonal fluctuations.
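
A box-and-whisker chart of price by room type, as described in the task above, summarises each group by its quartiles and whisker limits. As a sketch, those numbers can be computed with pandas; the prices are made up, and the 1.5 * IQR whisker rule is the common convention rather than anything specified in this notebook.

```python
import pandas as pd

# Toy prices (made up); the last shared-room value is a deliberate outlier
toy = pd.DataFrame({
    "room type": ["Private room"] * 5 + ["Shared room"] * 5,
    "price": [100, 200, 300, 400, 500, 50, 60, 70, 80, 1000],
})

# Quartiles of price within each room type
quartiles = toy.groupby("room type")["price"].quantile([0.25, 0.5, 0.75]).unstack()
quartiles.columns = ["q1", "median", "q3"]

# Whisker limits under the 1.5 * IQR convention
iqr = quartiles["q3"] - quartiles["q1"]
quartiles["lower_limit"] = quartiles["q1"] - 1.5 * iqr
quartiles["upper_limit"] = quartiles["q3"] + 1.5 * iqr
```

Values beyond the limits, such as the 1000 in the toy shared-room group, would be drawn as individual outlier points. With Plotly Express, `px.box(aib, x='room type', y='price')` draws the chart itself.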

Neighbourhood group in Map

In [171]:
# Create a scatter map using Plotly Express
fig = px.scatter_mapbox(aib,  # DataFrame containing data
           lat="lat",  # Latitude column
           lon="long",  # Longitude column
           opacity=0.3,  # Set marker opacity
           hover_name="neighbourhood group",  # Show 'neighbourhood group' when hovering
           hover_data=["neighbourhood group", "price"],  # Additional data to display on hover
           color="price",  # Color markers based on 'price' column
           color_continuous_scale='Viridis_r',  # Choose color scale
           title="Price comparing in the map",  # Set the title of the plot
           template="plotly",  # Choose the plot template
           zoom=10  # Set the initial zoom level
           )

# Customize the layout: map style, margins, font, height, and a centered title
fig.update_layout(
    mapbox_style="open-street-map",  # Choose mapbox style
    margin={"r": 8, "t": 54, "l": 8, "b": 10},  # Set plot margin
    font=dict(size=17, family="Franklin Gothic"),  # Customize font
    height=600,  # Set the height of the plot
    title_x=0.5,  # Set title's x position to the center
    title_y=0.95  # Set title's y position to the top
)

# Display the plot
fig.show()

The price map shows how Airbnb rental prices are distributed across New York City, with Manhattan having the lowest prices and Staten Island the highest. Darker colors indicate higher prices, while lighter colors represent lower prices. Hovering over a listing displays its latitude and longitude, neighborhood group, and price, giving a comprehensive overview of the rental landscape.

Word cloud

In [172]:
# Clean and preprocess the 'house_rules' column
aib1['house_rules'] = aib1['house_rules'].fillna('')  # Replace NaN values with empty strings
house_rules_text = " ".join(aib1['house_rules'].astype(str))

# Create a WordCloud object with a larger size
wordcloud = WordCloud(width=1000, height=600, background_color='white').generate(house_rules_text)

# Create the figure
plt.figure(figsize=(12, 7))

# Display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('House Rules')
plt.show()

Airbnb listings typically focus on providing guests with a comfortable and convenient place to stay, and the house rules reflect what hosts care about most. The most common words in the house rules concern smoking, check-in time, the house, pets, and amenities. Hosts can use the word cloud to identify the most important words and phrases to include in their own house rules.
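
A word cloud shows relative frequency visually; the underlying counts can also be listed directly. The sketch below counts words in a few made-up house rules with the standard library; the rules and the short `stop_words` set are illustrative assumptions.

```python
import re
from collections import Counter

# Toy house rules (made up); a missing rule is represented as an empty string
house_rules = [
    "No smoking, please, and no pets.",
    "No smoking in the house.",
    "",
    "Check-in after 3pm. No parties.",
]

# Words to ignore; a real stop-word list would be much longer
stop_words = {"and", "in", "the", "please", "after"}

# Lower-case the text, split it into words, and drop the stop words
words = [
    word
    for rule in house_rules
    for word in re.findall(r"[a-z]+", rule.lower())
    if word not in stop_words
]

# The two most frequent words with their counts
top_words = Counter(words).most_common(2)
```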

Donut chart

In [173]:
# Count how many listings received each review rating
review_rate_counts = aib['review rate number'].value_counts()

# Define the slice to pull out (rating 1); the ratings are integers, so compare against an int
explode_slice = 1

# Create a custom color palette for the slices
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']

# Pull amount for each slice: 0.1 for the selected rating, 0 for the others
explode_values = [0.1 if label == explode_slice else 0 for label in review_rate_counts.index]

# Create a donut chart using Plotly with a white border and one pulled-out slice
fig = go.Figure(data=[go.Pie(
    labels=review_rate_counts.index,
    values=review_rate_counts,
    pull=explode_values,  # Pull the selected slice away from the center
    marker=dict(
        line=dict(color='white', width=2),  # Set white border
        colors=colors  # Apply the custom color palette
    ),
    hole=0.4,  # Size of the center hole
    domain={"x": [0, 0.5]},  # Draw the donut in the left half of the figure
)])

# Customize the layout
fig.update_layout(
    title_text="Distribution of Review Rates",
    title_x=0.17,  # Place the title above the donut
    legend=dict(x=0.5, y=0.3),  # Adjust the legend position (x and y coordinates)
)

# Show the pie chart
fig.show()

The donut chart shows the distribution of customer review ratings. Ratings from 2 to 5 are fairly evenly spread, each accounting for approximately 22.7% of the total, whereas a rating of 1 accounts for only 8.69%. While the majority of customers provide positive ratings, there is room for improvement, as a noteworthy portion of guests leave lower ratings.

In [174]:
review_rate_per_neighbourhood_group = aib.groupby('neighbourhood group')['review rate number'].mean()
In [175]:
availability_per_neighbourhood_group = aib.groupby('neighbourhood group')['availability 365'].mean()

AVG rate and Availability per Neighbourhood Group

In [176]:
# 'review_rate_per_neighbourhood_group' and 'availability_per_neighbourhood_group' are the
# Series computed in the two cells above; both are indexed by neighbourhood group

# Create subplots with 1 row and 2 columns
fig = make_subplots(rows=1, cols=2, subplot_titles=("Average Review Rate", "Average Availability"))

# Add the first bar plot for average review rate
trace1 = go.Bar(
    x=review_rate_per_neighbourhood_group.index,
    y=review_rate_per_neighbourhood_group.values,
    text=[str(round(i, 2)) for i in review_rate_per_neighbourhood_group.values],
    marker=dict(color=px.colors.sequential.algae),
    name="Review Rate"
)

# Add the second bar plot for average availability
trace2 = go.Bar(
    x=availability_per_neighbourhood_group.index,
    y=availability_per_neighbourhood_group.values,
    text=[str(round(i)) for i in availability_per_neighbourhood_group.values],
    marker=dict(color=px.colors.sequential.deep),
    name="Availability"
)

# Add the traces to the subplots
fig.add_trace(trace1, row=1, col=1)
fig.add_trace(trace2, row=1, col=2)

# Update the layout for the entire figure
fig.update_layout(
    title_text="Average Review Rate and Availability per Neighbourhood Group",
    font=dict(size=20, color='white', family='Avenir'),
    template='plotly_dark',
    showlegend=False  # Hide the legend as it's not needed in this case
)

# Show the merged plot
fig.show()

The bar plots display the average review rate and average availability per neighborhood group in New York City. Staten Island had both the highest average review rate, 3.39 out of 5, and the highest availability, 195 days out of 365. Brooklyn had the lowest average review rate at 3.27 and also the lowest availability, at only 122 days out of 365. Queens and the Bronx had similar review rates, averaging 3.33 and 3.34, respectively (left panel), and availability of 158 and 177 days, respectively (right panel).
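
Both averages can also be produced by a single `groupby` with named aggregations, which keeps them in one table. A minimal sketch on made-up data (the column names `avg_review_rate` and `avg_availability` are chosen here for illustration):

```python
import pandas as pd

# Toy listings (made-up values) for two neighbourhood groups
toy = pd.DataFrame({
    "neighbourhood group": ["bronx", "bronx", "queens", "queens"],
    "review rate number": [3, 4, 2, 3],
    "availability 365": [100, 200, 150, 170],
})

# One groupby with named aggregations gives both averages in a single table
summary = toy.groupby("neighbourhood group").agg(
    avg_review_rate=("review rate number", "mean"),
    avg_availability=("availability 365", "mean"),
)

# Group with the highest average review rate
best_rated = summary["avg_review_rate"].idxmax()
```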

Parallel Categories Plot

In [177]:
# Check the median price before choosing a threshold for the price categories
aib["price"].median()
Out[177]:
625.0
In [178]:
# Define a function 'cato_price' that categorizes prices
def cato_price(p):
    if p > 600:
        return "More than 600"
    else:
        return "less than 600"

# Apply the 'cato_price' function to each value in the 'price' column
aib["cat_price"] = aib["price"].apply(cato_price)
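
The same split can be expressed without a Python-level function by using `np.where`. The sketch below reproduces the fixed threshold of 600 on made-up prices and, as an assumed alternative, splits at the median instead; the median-based labels are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy prices (made up)
prices = pd.Series([100, 600, 625, 1200])

# Fixed threshold, as in cato_price: a price of exactly 600 falls in the lower group
fixed_split = np.where(prices > 600, "More than 600", "less than 600")

# Assumed alternative: split at the median so the two groups are similar in size
median_price = prices.median()
median_split = np.where(prices > median_price, "above median", "at or below median")
```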
In [181]:
# Define the columns to be used for the parallel categories plot
labels = {"host_identity_verified": "Host Identity Verified", 
          "neighbourhood group": "Neighbourhood Group", 
          "room type": "Room Type", 
          "cat_price": "Price"}

# Create a parallel categories plot using Plotly Express
fig = px.parallel_categories(
    aib,
    dimensions=["host_identity_verified", "neighbourhood group", "room type", "cat_price"],  # Specify the columns to be used
    labels=labels  # Rename column labels for the plot
)

# Increase the figure size
fig.update_layout(
    width=1000,  # Set the width of the plot
    height=400  # Set the height of the plot
)

# Display the plot
fig.show()

In a parallel categories (or parallel sets) plot, each row of the data frame is grouped with the other rows that share the same values across the chosen dimensions, and each group is plotted as a polyline mark through a set of parallel axes, one for each dimension. This graph has four axes: "Host Identity Verified," "Neighbourhood Group," "Room Type," and "Price." Each axis has two to six categorical values. Examining a specific combination on the graph displays its count, for example the number of listings of a given room type in a particular neighbourhood group. The plot also shows whether the host's identity is verified and which price category the listings fall into. Counts for single categories are visible as well, such as how many listings are in Brooklyn or how many private rooms are available.
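
The counts behind such a plot can be tabulated directly: each distinct combination of category values is one group, and its size is the number of rows that share it. A minimal sketch on made-up rows, using three of the four dimensions:

```python
import pandas as pd

# Toy listings (made up) with three categorical dimensions
toy = pd.DataFrame({
    "host_identity_verified": ["verified", "verified", "unconfirmed", "verified"],
    "room type": ["Private room", "Private room", "Private room", "Shared room"],
    "cat_price": ["More than 600", "More than 600", "less than 600", "More than 600"],
})

dimensions = ["host_identity_verified", "room type", "cat_price"]

# One row per observed combination of category values, with its row count
combination_counts = toy.groupby(dimensions).size().reset_index(name="count")

# Counts for a single dimension, e.g. listings per room type
room_type_counts = toy["room type"].value_counts()
```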

Link to the data source :- https://www.kaggle.com/datasets/arianazmoudeh/airbnbopendata

Conclusion:

In our exploration of the Airbnb dataset for New York City, we've uncovered some intriguing insights:

Popular Destination: Manhattan emerges as the hottest Airbnb destination, boasting the highest number of listings. With a balanced mix of room types, reasonable prices, and favorable ratings, it's a prime choice for travelers.

Price Variations: Prices vary by location, accommodation type, and availability. Staten Island has the highest average prices for entire homes/apartments and shared rooms but the lowest for private rooms. For hotel rooms, Brooklyn takes the lead.

Host Rating Secrets: The overall average rating hovers around 3.34, with Staten Island standing out at 3.39. Ratings may be influenced by price, neighbourhood (for example, those with the best views), and house rules such as smoking and pet policies, which suggests that clear guidelines for guests matter.

Our journey through this dataset has shed light on the dynamics of Airbnb in New York City, revealing valuable insights for both travelers and hosts.

Thank you for joining us on this data-driven adventure!
